Identifying Predictive Collocations
Abstract
Idioms and common multi-word expressions are often argued to be stored as chunks of words or fixed configurations in the mind, and to therefore be accessed faster and interpreted more easily than fully compositional word combinations. Experimental research has furthermore shown that a specific “recognition point” can be identified in such expressions, at which enough information is present to access the meaning of the whole expression and predict the remaining words of the collocation. In this paper, we suggest measures for automatically identifying those multi-word expressions where the first part is particularly predictive of the rest, and evaluate our measures against human association data collected in a cloze test.
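The abstract does not spell out the proposed measures. As a rough illustration only, one simple way to quantify how strongly the first part of an expression predicts the rest is the forward conditional probability P(w2 | w1) estimated from bigram counts. This is a stand-in chosen for illustration, not necessarily the measure used in the paper; the function name `forward_predictiveness` and the toy corpus are hypothetical.

```python
from collections import Counter


def forward_predictiveness(tokens):
    """Return {(w1, w2): P(w2 | w1)} estimated from adjacent-word counts.

    Illustrative stand-in only; the paper's actual measures are not
    given in the abstract.
    """
    # Count each word in every position where it can start a bigram.
    first_counts = Counter(tokens[:-1])
    # Count adjacent word pairs.
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return {
        (w1, w2): count / first_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }


# Toy corpus, invented for illustration only.
corpus = "kith and kin . salt and pepper . kith and kin . black and white".split()
scores = forward_predictiveness(corpus)

# Rank bigrams from most to least predictive first word.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for (w1, w2), p in ranked[:3]:
    print(f"{w1} -> {w2}: {p:.2f}")
```

In this toy data "kith" is always followed by "and", so that pair scores 1.0, whereas "and" is followed by several different words and scores lower for each. Evaluating such scores against cloze data, as the abstract describes, would require the human responses, which are not reproduced here.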
Similar resources
A Corpus-based Analysis of Collocational Errors in the Iranian EFL Learners' Oral Production
Collocations are one of the areas generally considered problematic for EFL learners. Iranian learners of English, like other EFL learners, face various problems in producing oral collocations. An analysis of learners' spoken interlanguage indicates both the scope of the problem and the need for learners to spend more time and energy on mastering collocations. The present study specifically f...
Identifying collocations using cross-lingual association measures
We introduce a simple and effective cross-lingual approach to identifying collocations. This approach is based on the observation that true collocations, which cannot be translated word for word, will exhibit very different association scores before and after literal translation. Our experiments in Japanese demonstrate that our cross-lingual association measure can successfully exploit the combi...
Collocational Translation Memory Extraction Based on Statistical and Linguistic Information
In this paper, we propose a new method for extracting bilingual collocations from a parallel corpus to provide phrasal translation memories. The method integrates statistical and linguistic information to achieve effective extraction of bilingual collocations. The linguistic information includes parts of speech, chunks, and clauses. The method involves first obtaining an extended list of Englis...
Extracting collocations and their translations from parallel corpora
Identifying collocations in a text (e.g., break record) and correctly translating them (battre record vs. *casser record) represent key issues in machine translation, notably because of their prevalence in language and their syntactic flexibility. This article describes a method for discovering translation equivalents for collocations from parallel corpora, aimed at increasing the lexical cover...
INFO256 Project Report: Implementation and Evaluation of Xtract in WordSeer
Natural languages are full of word collocations that frequently co-occur and correspond to arbitrary word usages. They appear in both technical and non-technical textual corpora and often have specific significance in individual contexts. Accurately retrieving and identifying collocations from a given corpus in an unsupervised manner is imperative to understanding and automatically generating t...